Abstract. In a society of completely selfish individuals where everybody
is only interested in maximizing his own payoff, does any equilibrium
exist for the society? John Nash proved more than 50 years ago
that an equilibrium always exists such that nobody would benefit from 
unilaterally changing his strategy. Nash Equilibrium is a central concept 
in game theory, which offers a mathematical foundation for the social
sciences and economics. However, it is important from both a theoretical and
a practical point of view to understand game playing where individuals 
are less selfish. This paper offers a constructive generalization of Nash 
equilibrium to study n-person games where the selfishness of individuals 
can be defined at any level, including the extreme of complete selfishness. 
The generalization is constructive since it offers a protocol for individ- 
uals in a society to reach an equilibrium. Most importantly, this paper 
presents experimental results and theoretical investigation to show that 
the individuals in a society can reduce their selfishness level together to 
reach a new equilibrium where they can have better payoffs and the so- 
ciety is more stable at the same time. This study suggests that, for the 
benefit of everyone in a society (including the financial market), the pur- 
suit of maximal payoff by each individual should be controlled at some 
level either by voluntary good citizenship or by imposed regulations. 



1 Introduction 

John Nash proved in 1950 [1], using the Kakutani fixed point theorem, that any
n-player normal-form game [5] has at least one equilibrium. In the game, each
player has only a finite number of actions to take and commits to one strategy
when playing. If a player takes one of the actions in a deterministic way, it is called a
pure strategy. Otherwise, if a player takes any one of the actions following some
probability distribution defined on the actions, it is called a mixed strategy. At
a Nash equilibrium, each player has chosen a strategy (pure or mixed) and no 
player can benefit by unilaterally changing his or her strategy while the other 
players keep theirs unchanged. 

Nash equilibrium is arguably the most important concept in game theory,
which has had a significant impact on many other fields such as social science, economics,
and computer science. It is an important theory for understanding a common
scenario in game playing.

In a Nash equilibrium, each player's strategy is completely selfish because the 
player is only interested in maximizing his own payoff. Each player accepts only
his best action(s); sub-optimal actions are not considered
at all. The best action is defined as the one with the highest payoff. As a consequence
of this selfishness, even if the payoff of a sub-optimal action is slightly
less than the best one, the probability of picking this sub-optimal action by the 
player is still zero. 

However, many cultures teach people to be less selfish in a society. Also, 
the scenario of less-selfish players may be closer to reality, such as individuals 
in human societies or animal kingdoms. Our conventional wisdom tells us that 
if each of us gives away a bit more in favor of others, we could end up with 
more gains as return. That is, reduced selfishness leads to better payoffs for 
the individuals in a society. For instance, if we, as drivers, respect other drivers 
sharing the same road and give considerations for each other either voluntarily 
and/or by following traffic laws, then each of us will end up with a faster, safer 
drive to his/her destination than the case when everyone is only interested in 
maximizing his own speed to his destination. 

This paper presents both experimental results and theoretical investigation 
to show that, if the individuals in a society reduce their selfishness by simply 
accepting sub-optimal actions in some degree, a new equilibrium can be reached 
where better payoffs and social stability are obtained at the same time. 

The first key observation of this paper is that, reducing selfishness can im- 
prove payoffs. When completely selfish players at a Nash equilibrium reduce their 
selfishness, they will shift to a new equilibrium with payoffs possibly better than 
the original one. The observations range from the classic prisoner's dilemma
and a hard game used elsewhere in the game theory literature to computer-generated games
with hundreds to thousands of players. They support the conventional wisdom that
reducing selfishness could lead to better payoffs for everyone.

The second key observation is that, reducing selfishness can also improve 
social stability. A society of completely selfish individuals can be very sensitive 
to perturbations, the accuracy at representing individuals' utility functions, and 
communication errors among the individuals in the society. The smallest change 
in utility function or the slightest communication error could knock the indi- 
viduals out of their existing equilibrium. Furthermore, a society of completely 
selfish individuals can have an enormous number of equilibria. The number may 
increase exponentially with the population of the society. The society could end
up with one Nash equilibrium or another, depending on the initial conditions 
and sensitive to perturbations. If the individuals reduce their selfishness together, 
they can reduce their sensitivity to perturbations, inaccuracy in utility functions, 
and communication errors. At the same time, the number of equilibria tends to 
drop significantly so that the outcome of the society can be more predictable. 
When the selfishness is below a certain level, the society tends to have only one 
equilibrium and converges to it with any initial conditions. 
In particular, this paper gives a mathematical model for describing selfishness.
The level of selfishness is controlled by one parameter of the model to cover
the spectrum ranging from complete selfishness to complete selfishlessness. With 
the parameterized selfishness model, this paper offers a generalization of Nash 
equilibrium together with a proof of the existence of an equilibrium given any 
selfishness level using a fixed point theorem. It is a generalization because this 
paper offers a proof to show that a generalized equilibrium at the particular case 
of completely selfish players falls back to a Nash equilibrium. In other words, the 
definition of Nash equilibrium is a special case of the generalized one. It is impor- 
tant to note that the generalization is constructive because it defines a protocol 
for the players in a game to interact with each other so that an equilibrium can 
be reached with any selfishness level. 

2 A Constructive Generalization 

An n-player normal-form game is defined by:

— n players 1, 2, ..., n;

— Each player i has a finite set of strategies $S_i = \{s_{i1}, s_{i2}, \ldots, s_{i m_i}\}$. Strategies
are also called actions. The Cartesian product of the $S_i$, $S = S_1 \times S_2 \times \cdots \times S_n$,
is called the set of pure strategy profiles (the set of action tuples).

— Each player i has a utility function $u_i(x)$, a real-valued function
defined on the set of pure strategy profiles S, i.e., $u_i : S \to \mathbb{R}$ (a
mapping from each action tuple to a real value).

If player i takes one of the actions from $S_i$ in a deterministic way, it is called
a pure strategy. Otherwise, if the player takes any of the actions following some
probability distribution $p_i$ defined on the action set $S_i$, it is called a mixed
strategy. That is, for each action $x_i \in S_i$, player i takes this action with
probability $p_i(x_i)$. A set of (mixed) strategies $\{p_1, p_2, \ldots, p_n\}$, one for each
player, is called a (mixed) strategy profile p.

Assume that the n players take a strategy profile p. Then the payoff of player
i is defined as the expected utility

$$u_i(p) = \sum_{x \in S} u_i(x) \prod_{j=1}^{n} p_j(x_j), \quad \text{for } i = 1, 2, \ldots, n .$$

The objective of each player is to maximize his payoff.

A strategy profile excluding the one for player i is denoted as $p_{-i}$. A strategy
profile $p^*$ is a Nash equilibrium if for all i and for all $p_i$,

$$u_i(p_i, p^*_{-i}) \le u_i(p^*_i, p^*_{-i}) .$$

That is, no unilateral deviation in strategy by any player gives a higher payoff for
that player. Nash's 1950 PNAS paper proves the existence of an equilibrium for
any finite n-player game using Kakutani's fixed point theorem.
In the following discussion, without loss of generality, we assume that all utility
functions take positive values, i.e., $u_i(x) > 0$ for any $x \in S$.

If player i takes an action $x_i \in S_i$ in response to the other players' strategies $p_{-i}$,
the payoff is $u_i(x_i, p_{-i})$. The optimal action $x_i^*$ for player i is defined as the
one with the highest payoff, i.e.,

$$u_i(x_i^*, p_{-i}) = \max_{x_i \in S_i} u_i(x_i, p_{-i}) .$$

Obviously,

$$u_i(p_i, p_{-i}) = \sum_{x_i \in S_i} p_i(x_i)\, u_i(x_i, p_{-i}) .$$



One of the important properties of a Nash equilibrium $p^*$ is that only the
optimal action(s) has non-zero probability, i.e., if $p_i^*(x_i) > 0$, then $x_i$ must be
an optimal action for player i. In other words, player i is completely selfish
because he only accepts the optimal action for himself.

Assume that, at a time instance t, the strategy profile excluding the one for
player i is $p_{-i}(t)$, so that the action payoff for player i is $u_i(x_i, p_{-i}(t))$ for $x_i \in S_i$. Based
on the above observation, we can define a mathematical model for
constructing the next strategy $p_i(x_i, t+1)$ of player i from his
action payoff $u_i(x_i, p_{-i}(t))$ at the current time t. Specifically, we can construct
$p_i(x_i, t+1)$ such that it is proportional to a power of $u_i(x_i, p_{-i}(t))$, e.g.,

$$p_i(x_i, t+1) \propto \left(u_i(x_i, p_{-i}(t))\right)^{\alpha}, \quad \text{for } x_i \in S_i ,$$

where $\alpha$ is a non-negative parameter. Since $p_i(x_i, t+1)$ should be
normalized as a probability, the above formula can be rewritten as

$$p_i(x_i, t+1) = \frac{\left(u_i(x_i, p_{-i}(t))\right)^{\alpha}}{\sum_{x'_i \in S_i} \left(u_i(x'_i, p_{-i}(t))\right)^{\alpha}}, \quad \text{for } i = 1, 2, \ldots, n . \qquad (1)$$

In (1), when $\alpha \to \infty$, the best action has a non-zero probability while the others
have probability zero. That is, the player only accepts the best action, the
one with the highest payoff $u_i(x_i, p_{-i})$. This is exactly the case of Nash
equilibrium described before.

If the value of $\alpha$ is reduced from the above extreme case, player i starts to
accept sub-optimal actions by assigning non-zero probability to them. The degree
of acceptance increases as $\alpha$ decreases further. At the other extreme,
when $\alpha \to 0$, each action is assigned the same probability and the player has
no preference for any one of the actions. All of the actions are treated equally and
they are sampled uniformly. In this case, the player is completely selfishless. In
summary, the parameter $\alpha$ describes the selfishness level of player i. It covers the
spectrum ranging from complete selfishness ($\alpha \to \infty$) to complete selfishlessness
($\alpha = 0$).
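The update rule (1) for a single player can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the experiments: the function name `generalized_response` and the sample payoffs are hypothetical, and dividing by the largest payoff before exponentiation is an added numerical safeguard (the common factor cancels in the normalization).

```python
def generalized_response(action_payoffs, alpha):
    """One player's next strategy under rule (1): p(x) proportional to u(x)**alpha.

    action_payoffs: positive payoffs u_i(x_i, p_-i), one entry per action.
    alpha: non-negative selfishness level.
    """
    # Scale by the largest payoff so that u**alpha cannot overflow for
    # large alpha; the scaling factor cancels when normalizing.
    top = max(action_payoffs)
    weights = [(u / top) ** alpha for u in action_payoffs]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical action payoffs, chosen only for illustration.
payoffs = [2.0, 1.9, 1.0]
for alpha in (0, 1, 10, 1000):
    strategy = generalized_response(payoffs, alpha)
    print(alpha, [round(p, 4) for p in strategy])
```

With alpha = 0 every action receives the same probability, and as alpha grows the probability mass concentrates on the action with the highest payoff, even though the second action's payoff is only slightly lower.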

In the special case of $\alpha \to \infty$, the game playing defined by the constructive
generalization (1) is the same in principle as fictitious play, introduced by G.W.
Brown in 1951 [3]. In fictitious play, each player takes the optimal action(s) in
response to the strategies of the other players.






Definition 1. Given a non-negative real value for the selfishness level $\alpha$, i.e.,
$\alpha \ge 0$, if the iterative computation defined by (1) reaches an equilibrium, that
is, there is a strategy profile $p^*$ satisfying

$$p_i^*(x_i) = \frac{\left(u_i(x_i, p^*_{-i})\right)^{\alpha}}{\sum_{x'_i \in S_i} \left(u_i(x'_i, p^*_{-i})\right)^{\alpha}}, \quad \text{for } i = 1, 2, \ldots, n , \qquad (2)$$

then the strategy profile $p^*$ is called a generalized equilibrium.

In parallel with Nash's 1950 PNAS paper, the proof of the existence of a gen- 
eralized equilibrium given any selfishness level is provided below. Furthermore, 
it will be shown that when the selfishness level is sufficiently high, a generalized 
equilibrium falls back to a Nash equilibrium. 

Theorem 1. A generalized equilibrium $p^*$ defined by (2) exists for any n-player
normal-form game with any non-negative selfishness level $\alpha$ ($\alpha \ge 0$).
This remains true even if each player i in the game has his own selfishness level $\alpha_i$,
possibly different from the rest.

Proof. The set of iterative equations (1) defines a mapping from the set of strategy
profiles to itself. This set is a product of probability simplices, hence compact and convex, and the mapping is continuous
because all utilities are positive. A fixed point therefore exists by the Brouwer fixed point theorem.

The second part of this theorem tells us that, for any n-player normal form 
game, even if the selfishness level is different from player to player, a generalized 
equilibrium still exists for the game. 

It is important to note that (2) defines a system of polynomial equations if $\alpha$ is an
integer. If it is also an even number, then any real-valued solution to this system,
which must also be a positive solution, is also a generalized equilibrium for
the game playing. Also, the game playing defined by (1) can be treated as an
iterative, direct method to find an equilibrium of the game playing. It defines
a protocol for the players in a game to interact with each other so that an 
equilibrium can be reached with any selfishness level. Following this protocol, 
each player only needs to know his own utility function and the strategies of 
other players at the current time to compute his strategy for the next time. The 
strategies of other players can be obtained through either statistical learning or 
message passing among the players. 

Theorem 2. When the selfishness level $\alpha$ is sufficiently large, i.e., $\alpha \to \infty$,
any generalized equilibrium defined by (2) can be arbitrarily close to a Nash
equilibrium and vice versa.

The proof is given in subsection 5.1 in the Appendix.

As a consequence, any real-valued solution to the system of polynomial equations defined
by (2) with a large even number for $\alpha$ can serve as a good approximation
to a Nash equilibrium. Alternatively, the game playing defined by (1) with a
sufficiently large $\alpha$ can be applied directly to reach an equilibrium, which can
also serve as a good approximation of a Nash equilibrium.
When $\alpha \to \infty$, from (1) we can see that $p_i(x_i, t+1)$ is no longer a continuous
function of $u_i(x_i, p_{-i}(t))$. In this case, any mixed strategy can be extremely
unstable under the slightest change in $u_i(x_i, p_{-i}(t))$ caused by inaccuracy in
representing the utility functions, variation of the utility functions,
perturbations, or communication errors among the players. For example, a small
variation in the utility function could lead to a dramatic shift of the equilibrium
from one point in the strategy profile space to another one. It is hard for
an algorithmic method to converge to an unstable equilibrium purely based on
iterations.

Even if a game of completely selfish players can reach an equilibrium, it may
have an enormous number of equilibria, possibly growing exponentially with
the number of players in the game. The players could get stuck in one
Nash equilibrium or another, depending on the initial conditions and sensitive
to perturbations. How to reach an equilibrium which gives a relatively good overall
payoff for the game becomes a challenging problem (the overall payoff for a game
is defined as the sum of the players' payoffs, i.e., $\sum_i u_i(x)$).

In summary, complete selfishness of the players in a game
may make it difficult for the players to reach an equilibrium. Even if an equilibrium
is found, it could also be unstable, sensitive to perturbations, sensitive
to inaccuracy or variations in utility functions, and vulnerable to communica- 
tion errors. Furthermore, the overall payoff of the game may be ignored due to 
the fact that each player only tries to maximize his own payoff. It is desirable to 
improve the overall payoff for a society because it stands for improved individual 
payoff on average. Also, everyone in the society could benefit from the improved 
overall payoff if some social welfare system is implemented to redistribute the 
social wealth. Can we improve the overall payoff and the stability of a game 
playing by simply reducing the selfishness level of the players in the game? 

Both the experimental results in the following section and a theoretical investigation
in the Appendix (subsection 5.2) answer the above question in the affirmative. The theoretical
investigation shows that the game playing defined by the constructive generalization
(1) is a variation of a global optimization algorithm [8] defined by
a multi-agent system. When the value of the parameter $\alpha$ is reduced below a
certain threshold, the global optimization algorithm has one and only one equilibrium
and converges to it at an exponential rate. If the equilibrium is also a
consensus one among all the agents, then it must be the global optimum, as guaranteed
by theory. Above the threshold, the number of equilibria of the global
optimization algorithm may grow with the value of the parameter $\alpha$. That is, the
algorithm becomes less stable as $\alpha$ increases, although the chance of reaching
a consensus increases.

The theory suggests that reducing the selfishness level from the extreme
of complete selfishness can stabilize the game playing and possibly improve the
overall payoff. The experiments in the next section verify that the overall payoff
for many games is best in a statistical sense at a certain level of selfishness,
neither at complete selfishness nor at complete selfishlessness.
Applying this to social situations, it suggests that a society should let some level
of selfishness remain in its individuals. Otherwise, nobody has any motivation
to pursue better payoffs. Also, it is not recommended to take the other extreme
where everyone is completely selfish. However, how to find the best selfishness level
for any game remains an open question.



3 Experimental Results 

The prisoner's dilemma constitutes a basic problem in game theory. It is a typical 
non-zero-sum game in which each of two players can either "cooperate" with or "defect" against the
other player. In this game, the only concern of each individual player ("prisoner")
is to maximize his/her own payoff. Regardless of what the opponent chooses, each 
prisoner always receives a higher payoff by defecting; i.e., defecting is the strictly 
dominant strategy. Therefore, the only possible Nash equilibrium for the game 
is for all prisoners to defect. 

An example payoff matrix of the prisoner's dilemma is given as follows (row player's payoff first):

             Cooperate   Defect
Cooperate      3, 3       1, 4
Defect         4, 1       2, 2

At the Nash equilibrium (Defect, Defect), the
payoffs of the two players are (2, 2). It corresponds to the case when the selfishness
level $\alpha = \infty$. When the two players reduce their selfishness level together, their
payoffs at equilibria also increase together, as shown in Figure 1. Those equilibria
are found by the constructive generalization with different selfishness levels.



[Plot: payoff versus Selfishness Level]



Fig. 1. Payoffs of prisoner's dilemma under different selfishness levels. 
From Fig. 1 we can see that when the two players have the same selfishness
level and the level is high ($\alpha = 30$), their payoffs are close to those of
the Nash equilibrium. As soon as both players reduce their selfishness
level, both get better payoffs than those of the Nash equilibrium. When the
selfishness level is reduced to one ($\alpha = 1$), the payoffs are close to 2.4 for both, a
20% increase over that of the Nash equilibrium.
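The figures quoted above can be reproduced with a short sketch of the iteration (1) applied to the prisoner's dilemma payoffs (3, 1, 4, 2 for the row player). The helper names, the uniform starting strategies, and the iteration count are assumptions made for illustration, and both players are assumed to update simultaneously with the same selfishness level.

```python
# Row player's payoffs; index 0 = Cooperate, 1 = Defect. The game is symmetric,
# so the column player's payoff for (row a, column b) is A[b][a].
A = [[3.0, 1.0],
     [4.0, 2.0]]


def response(action_payoffs, alpha):
    """Rule (1): probabilities proportional to payoff**alpha."""
    top = max(action_payoffs)
    weights = [(u / top) ** alpha for u in action_payoffs]
    total = sum(weights)
    return [w / total for w in weights]


def generalized_equilibrium(alpha, iterations=1000):
    """Iterate rule (1) for both players; return row strategy and row payoff."""
    p = [0.5, 0.5]  # row player's mixed strategy (assumed starting point)
    q = [0.5, 0.5]  # column player's mixed strategy (assumed starting point)
    for _ in range(iterations):
        u_row = [sum(A[a][b] * q[b] for b in range(2)) for a in range(2)]
        u_col = [sum(A[b][a] * p[a] for a in range(2)) for b in range(2)]
        p, q = response(u_row, alpha), response(u_col, alpha)
    payoff_row = sum(p[a] * A[a][b] * q[b] for a in range(2) for b in range(2))
    return p, payoff_row


for alpha in (1.0, 5.0, 30.0):
    strategy, payoff = generalized_equilibrium(alpha)
    print(alpha, round(strategy[0], 4), round(payoff, 4))
```

For $\alpha = 1$ the symmetric fixed point can also be found by hand: the cooperation probability q solves $4q^2 + q - 1 = 0$, so $q = (\sqrt{17} - 1)/8 \approx 0.39$, and the payoff is $2 + q \approx 2.39$, consistent with the value of about 2.4 reported above.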

The result seems counterintuitive because if one player could update his
strategy to improve his payoff, he should go ahead and do so to receive
a better payoff. However, in many cases, all the players in a game are interconnected.
The gain of one player often leads to the loss of other players. If
everyone yields back a little bit of his payoff as a favor to others, everyone can 
end up with better payoff as a returned favor from others instead. 

To show the power of the constructive generalization at finding Nash equilibria,
a 2-player game is used with the following payoff matrix:

$$\begin{pmatrix}
2, 3 & -1, 4 & 2, 4 & 5, 2 & \\
2, 2 & 3, & 4, 1 & -2, 4 & 1, 3 \\
4, 6 & 7, 2 & 2, -2 & 4, 9 & 2, 1 \\
9, & -2, 6 & 6, 3 & 7, & 0, 5 \\
3, 2 & 6, 1 & 2, 5 & 5, 3 & 1, 0
\end{pmatrix}$$



This game has been used elsewhere in the game theory literature as a hard game
because it has only one mixed Nash equilibrium. The strategy for the row
player is $(0, 0, \tfrac{2}{11}, \tfrac{4}{11}, \tfrac{5}{11})$ with the payoff 4. The strategy for the column player
is $(0, \tfrac{2}{7}, \tfrac{3}{7}, \tfrac{2}{7}, 0)$ with the payoff 3. This mixed Nash equilibrium is extremely
unstable. Assume that the two players play the game by taking only the best
action. Assume further that the column player cannot represent fractions exactly.
Instead, the player uses real values to approximate them, just like the real
values stored in most computers. Then, a very slight round-off error in the value
$\tfrac{3}{7}$, say 0.4285714285714285714285714286, could knock the row player off his
Nash equilibrium strategy to the new one (0, 0, 0, 1, 0), which will in turn knock
the column player off his Nash equilibrium strategy. As a consequence, both of
them will immediately be knocked off the Nash equilibrium and get stuck in
a chaotic situation. Therefore, this game is hard for an iteration-based direct
method to reach the unique mixed Nash equilibrium.
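The instability described above can be checked with exact rational arithmetic. The sketch below assumes the column player's equilibrium strategy is (0, 2/7, 3/7, 2/7, 0), which is what the indifference conditions for rows 3 to 5 give, and it uses only the row player's payoffs against columns 2 to 4 for rows 2 to 5, all of which are legible in the matrix. The names `ROW_PAYOFF` and `row_payoffs` are hypothetical.

```python
from fractions import Fraction as F

# Row player's payoffs for rows 2..5 against columns 2, 3, 4.
ROW_PAYOFF = [[3, 4, -2],   # row 2
              [7, 2, 4],    # row 3
              [-2, 6, 7],   # row 4
              [6, 2, 5]]    # row 5


def row_payoffs(q):
    """Expected payoff of each row action against column weights q."""
    return [sum(u * w for u, w in zip(row, q)) for row in ROW_PAYOFF]


# Exact equilibrium weights on columns 2, 3, 4: rows 3, 4, 5 tie at payoff 4.
exact = [F(2, 7), F(3, 7), F(2, 7)]
print(row_payoffs(exact))

# Replace 3/7 by the rounded decimal from the text: the tie is broken.
rounded = [F(2, 7), F("0.4285714285714285714285714286"), F(2, 7)]
values = row_payoffs(rounded)
best_row = 2 + max(range(len(values)), key=lambda r: values[r])
print(best_row)
```

Because the rounded value is slightly larger than 3/7, row 4 becomes the strict best response, which matches the pure strategy (0, 0, 0, 1, 0) mentioned in the text.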

Despite its hardness, the constructive generalization (1), as an iteration-based
direct method, can find a very good approximation to the Nash equilibrium.
To improve the convergence of the method, an additional step is added
after computing $p_i(x_i, t+1)$ defined by (1) to smooth out its fluctuation. It is done
by keeping some memory of the previous value $p_i(x_i, t)$, i.e.,

$$\lambda\, p_i(x_i, t+1) + (1 - \lambda)\, p_i(x_i, t) \to p_i(x_i, t+1) ,$$

where $\lambda = 0.001$ was used in the experiment.
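The smoothing step is a convex combination of the newly computed strategy and the previous one. A minimal sketch follows; the function name `smooth` and the sample vectors are hypothetical.

```python
def smooth(candidate, previous, lam=0.001):
    """Blend the freshly computed strategy with the previous one.

    candidate: strategy just computed by rule (1).
    previous:  strategy from the previous time step.
    lam:       weight on the new strategy (0.001 in the experiment).
    """
    return [lam * c + (1.0 - lam) * p for c, p in zip(candidate, previous)]


# Illustrative values only: a sharp jump in the candidate is damped heavily.
print(smooth([1.0, 0.0], [0.5, 0.5]))
```

Since both inputs are probability vectors, the blended result still sums to one, so no renormalization is needed.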

Furthermore, to increase the chance for the constructive generalization to
reach an equilibrium at a high selfishness level $\alpha$, the value of $\alpha$ is progressively
raised from a small value, say 1. When it reaches the value 1000, the payoff for
the row player is 4.0068, a difference of around 0.17% from the payoff 4 of the Nash
equilibrium. His strategy (left) is very close to the Nash equilibrium one (right)
as shown below:

$$(0, 0, \ldots, \ldots, \ldots) \approx (0, 0, \tfrac{2}{11}, \tfrac{4}{11}, \tfrac{5}{11}) .$$

The payoff for the column player is 3.0001097, a difference of around 0.0037% from
the payoff 3 of the Nash equilibrium. His strategy (left) is very close to the Nash
equilibrium one (right) as shown below:

$$(0, \ldots, \ldots, \ldots, 0) \approx (0, \tfrac{2}{7}, \tfrac{3}{7}, \tfrac{2}{7}, 0) .$$



Fig. 2 shows the changes of the payoffs of the two players in relation to the
selfishness level. The payoff of the row player peaks around $\alpha = 8$ with the
value 4.77572. The payoff of the column player peaks around $\alpha = 5$ with
the value 3.30341. At $\alpha = 7$, the payoffs for both are (4.7586, 3.2527), a 19%
improvement for the row player and an 8.4% improvement for the column player
over the payoffs (4, 3) of the Nash equilibrium.
[Plot: payoffs of the two players versus Selfishness Level, 10 to 100]



Fig. 2. Payoffs of two players with 5 actions under different selfishness levels. 



To illustrate the power of the constructive generalization (1) at stabilizing
game playing, a 2-player game with 6 actions for each player is constructed with the
following payoff matrix:

$$\begin{pmatrix}
6, 6 & 1, 1 & 1, 1 & 1, 1 & 1, 1 & 1, 1 \\
1, 1 & 6.5, 6.5 & 1, 1 & 1, 1 & 1, 1 & 1, 1 \\
1, 1 & 1, 1 & 7, 7 & 1, 1 & 1, 1 & 1, 1 \\
1, 1 & 1, 1 & 1, 1 & 7.5, 7.5 & 1, 1 & 1, 1 \\
1, 1 & 1, 1 & 1, 1 & 1, 1 & 8, 8 & 1, 1 \\
1, 1 & 1, 1 & 1, 1 & 1, 1 & 1, 1 & 8.5, 8.5
\end{pmatrix}$$



Clearly, this game has six actions for each player and six pure Nash equilibria.
Let us label the actions for each player as 1, 2, 3, 4, 5, 6. If any player picks the
ith action at random, the other will take the same action as the best response.
As a consequence, a Nash equilibrium is found.

With this best-response playing, the average payoff for each player is 7.725 and
the variance of the payoff is $35/48 \approx 0.729$. Three hundred generalized equilibria
are found using the constructive generalization (1) with each of the selfishness levels
$\alpha = 100, 4, 2, 1$. The results for $\alpha = 100, 4, 2$ are shown in Fig. 3,
Fig. 4, and Fig. 5, respectively. From the figures we can see that the stability of the
game playing defined by the constructive generalization improves progressively
as the selfishness level $\alpha$ decreases. Here, stability is measured inversely by
the variance of the payoff. When $\alpha = 1$, the game playing converged to
a unique equilibrium with the payoff 2.10373 in all three hundred runs. That is,
the game playing tends to have only one equilibrium when the selfishness level
drops below a certain threshold. Also we can see from the three figures that the
average payoff of each player is highest when $\alpha = 4$.

From this example, we can see that the constructive generalization yields the 
best payoffs for the players in a game at a certain selfishness level. The stability 
of the game playing continuously improves as the selfishness level reduces. That 
is, reducing the selfishness level can always improve the stability of the game 
playing. However, the players in a game can only get the highest payoffs in
a statistical sense at a certain selfishness level (subsection 5.2 in the
Appendix gives a theoretical investigation that offers some explanation).

In the following set of experiments, computer-generated societies with populations
ranging from hundreds to a thousand are used to demonstrate the improvement
of payoffs and stability by reducing the selfishness level. In each society,
each individual has a number of neighbors and his payoff function is defined
by the summation of the pairwise joint actions of himself and his neighbors as
follows

$$u_i(x) = \sum_{j \in \mathcal{N}(i)} f_{ij}(x_i, x_j) ,$$

where $\mathcal{N}(i)$ is the set of the individual i's neighbors. The overall payoff of the
society is defined as

$$\sum_i u_i(x) . \qquad (3)$$
[Plot: overall payoff of each equilibrium; horizontal axis: Equilibrium Number, 1 to 300]

Fig. 3. Overall payoffs of 300 generalized Nash equilibria with the selfishness
level $\alpha = 100$. The average $\mu = 7.728$ and variance $\sigma^2 = 0.502$.

[Plot: overall payoff of each equilibrium; horizontal axis: Equilibrium Number, 1 to 300]

Fig. 4. Overall payoffs of 300 generalized Nash equilibria with the selfishness
level $\alpha = 4$. The average $\mu = 7.910$ and variance $\sigma^2 = 0.379$.

[Plot: overall payoff of each equilibrium; horizontal axis: Equilibrium Number, 1 to 300]

Fig. 5. Overall payoffs of 300 generalized Nash equilibria with the selfishness
level $\alpha = 2$. The average $\mu = 6.888$ and variance $\sigma^2 = 0.193$.



Each function value $f_{ij}(x_i, x_j)$ is uniformly sampled from the interval [0, 1].
The neighbors of each individual are randomly picked from the entire population. 
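A society of this kind can be generated along the following lines. This is only a sketch under stated assumptions: the neighbor relation is taken to be symmetric, the average degree is reached by drawing random undirected edges, each ordered pair (i, j) gets its own payoff table, and the overall payoff is the sum of the individual payoffs. All function names are hypothetical, and the small sizes in the usage lines are chosen for speed, not taken from the experiments.

```python
import random


def make_society(n, n_actions, avg_degree, seed=0):
    """Random neighbor sets and pairwise payoff tables f[(i, j)][x_i][x_j]."""
    rng = random.Random(seed)
    neighbors = [set() for _ in range(n)]
    target_edges = n * avg_degree // 2  # assumed: undirected edges
    edges = 0
    while edges < target_edges:
        i, j = rng.sample(range(n), 2)
        if j not in neighbors[i]:
            neighbors[i].add(j)
            neighbors[j].add(i)
            edges += 1
    # Every table entry is sampled uniformly from [0, 1).
    f = {(i, j): [[rng.random() for _ in range(n_actions)]
                  for _ in range(n_actions)]
         for i in range(n) for j in neighbors[i]}
    return neighbors, f


def payoff(i, x, neighbors, f):
    """Payoff of individual i for the pure action tuple x."""
    return sum(f[(i, j)][x[i]][x[j]] for j in neighbors[i])


def overall_payoff(x, neighbors, f):
    """Sum of all individual payoffs."""
    return sum(payoff(i, x, neighbors, f) for i in range(len(x)))


neighbors, f = make_society(n=12, n_actions=3, avg_degree=4)
x = [0] * 12  # everyone takes action 0, purely for illustration
print(overall_payoff(x, neighbors, f))
```

Storing one table per ordered pair keeps the lookup simple; whether $f_{ij}$ and $f_{ji}$ coincide is not stated in the text, so the sketch does not assume it.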

In the first experiment, an instance of a society of 121 individuals is generated
where each one has 50 actions and 6 neighbors on average. 300 Nash equilibria
are discovered by fictitious play and 300 generalized ones are discovered by the
constructive generalization with the selfishness level $\alpha = 20$. Fig. 6 shows the
overall payoffs of the former versus those of the latter. From the figure we
can see that reducing the selfishness level can lead to remarkable improvement
both in payoffs and stability.

In the second experiment, the population is increased to 601, the number
of actions per person is reduced to 20, and the average number of neighbors is
increased to 30. Figure 7 shows the overall payoffs of 300 Nash equilibria versus
the 300 generalized ones with the selfishness level $\alpha = 20$. From the figure we can
see that reducing the selfishness level can lead to remarkable improvement both
in payoffs and stability with a larger population.

In the third experiment, the population is increased further to 1001, the
number of actions per person is reduced to 10, and the average number of neighbors
is increased to 50. From Figure 8 we can draw the same conclusions as
above with an even larger population.

[Plot: overall payoff of each equilibrium; horizontal axis: Equilibrium Number]

Fig. 6. Overall payoffs of 300 Nash equilibria (bottom) versus 300 generalized
ones (top, the selfishness level $\alpha = 20$) for a society of 121 individuals. For
the former, the average $\mu = 600.67$ and variance $\sigma^2 = 17.5$. For the latter, the
average $\mu = 622.60$ and variance $\sigma^2 = 10.7$.

[Plot: overall payoff of each equilibrium; horizontal axis: Equilibrium Number]

Fig. 7. Overall payoffs of 300 Nash equilibria (bottom) versus 300 generalized
ones (top, the selfishness level $\alpha = 20$) for a society of 601 individuals. For
the former, the average $\mu = 11766$ and variance $\sigma^2 = 1009$. For the latter, the
average $\mu = 11899$ and variance $\sigma^2 = 392$.

Fig. 8. Overall payoffs of 300 Nash equilibria (bottom) versus 300 generalized
ones (top, the selfishness level $\alpha = 30$) for a society of 1001 individuals. For
the former, the average $\mu = 30274$ and variance $\sigma^2 = 3335$. For the latter, the
average $\mu = 30677$ and variance $\sigma^2 = 818$.

The last three experiments with societies of different population sizes are
extended with more selfishness levels. The average overall payoff and the fluctuation
of the overall payoff of a society with different selfishness levels $\alpha$ are
shown in the following table. The fluctuation is indicated by the variance of the
overall payoff given a selfishness level. The less fluctuation a society has, the
more stable the society is.
From the above table, we can see that the overall payoffs of the three societies
improve progressively as the selfishness level $\alpha$ is reduced, starting from
$\alpha = \infty$ (complete selfishness). Each society yields the highest overall payoff
at some selfishness level and degrades progressively with further reduction
of the selfishness level. The stability of each society continuously improves as
the selfishness level is reduced. That is, reducing the selfishness level can always
improve the stability of a society. This experiment shows us that a less selfish
society can be better in overall payoff and stability than a completely selfish
society.

A less selfish society can also be more efficient than a completely selfish
society. The efficiency of a society can be measured by its capability of finding
a good equilibrium in terms of the overall payoff. To compare the efficiency, the
same society of 121 individuals described before is used in the experiment.
When the individuals in the society are less selfish ($\alpha = 20$), the average overall
payoff of the 300 equilibria found by the society is 622.60 (see also Fig. 6).
When all the individuals become completely selfish, after the society has explored one million
equilibria, the best overall payoff is 621.5, less than
the former value of 622.60. This result says that the average payoff of the less selfish
society in a generalized equilibrium is better than the best payoff among those
of one million Nash equilibria explored by the completely selfish society. The less
selfish society spent seconds on average to find an equilibrium, while the completely
selfish society took almost a whole day to find the one million equilibria using a
laptop with an AMD Turion™ X2 Dual-Core Mobile Processor and 3 GB of RAM.
The less selfish society is several orders of magnitude more efficient than the
completely selfish society.

Fig. 9 shows how the best overall payoff improves as the number of equilibria
discovered by the completely selfish society mentioned above increases.
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Fig. 9. After exploring one million equilibria by a society of 121 completely 
selfish individuals, the best one in terms of overall payoff still couldn't match 
the average one (dotted line) found by the same society when all the individuals 
are less selfish. 
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4 Conclusions 



John Nash, in his Nobel Prize-winning work, defined an equilibrium and proved
its existence for n-player games where all players are completely selfish. However,
it is important from both a theoretical and a practical point of view to
understand game playing where players are less selfish. The key contribution of
this paper is a generalization of Nash equilibrium to cover the entire spectrum of
selfishness ranging from complete selfishness to complete selflessness. It also
gives the proof of the existence of an equilibrium for a game of n players with
any selfishness level. The definition of Nash equilibrium is a special case of this
generalization where all players are completely selfish. The generalization is
constructive since it offers a protocol for players in a game to reach an equilibrium.
Most importantly, this paper presents experimental results and theoretical
investigation to show that the players in a game can reduce their selfishness level
together to reach a new equilibrium where they can have better payoffs and the
game playing is more stable at the same time.
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5 Appendix 

5.1 Proof for Theorem 2 

Definitions and Notations At time instance $t$, let $u_i(x_i, p_{-i}(t))$ be the payoff
of player $i$ by taking action $x_i$ in response to other players' strategies $p_{-i}(t)$. It
is a function of $x_i$ and $t$, called the action payoff function, denoted as $\Psi_i(x_i, t)$.
Obviously, we have

$$\Psi_i(x_i, t) = u_i(x_i, p_{-i}(t)) = \sum_{\sim x_i} u_i(x) \prod_{j \neq i} p_j(x_j, t), \quad \text{for any } i. \tag{4}$$

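Using the notation, the constructive generalization (1) can be rewritten as

$$p_i(x_i, t+1) = \frac{(\Psi_i(x_i, t))^{\alpha}}{\sum_{x_i} (\Psi_i(x_i, t))^{\alpha}}, \quad \text{for } i = 1, 2, \ldots, n. \tag{5}$$

That is, $p_i(x_i, t+1)$ equals the normalized $(\Psi_i(x_i, t))^{\alpha}$. To show the
relationship, $p_i(x_i, t+1)$ can be expressed as $\overline{(\Psi_i(x_i, t))^{\alpha}}$ with the bar standing for the
normalization. That is,

$$p_i(x_i, t+1) = \overline{(\Psi_i(x_i, t))^{\alpha}}, \quad \text{for } i = 1, 2, \ldots, n. \tag{6}$$

Substituting (6) into (4), we have an iterative update function for $\Psi_i(x_i, t)$ as
follows

$$\Psi_i(x_i, t+1) = \sum_{\sim x_i} \Big( u_i(x) \prod_{j \neq i} \overline{(\Psi_j(x_j, t))^{\alpha}} \Big), \quad \text{for } i = 1, 2, \ldots, n. \tag{7}$$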
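
As a minimal sketch of the protocol behind (5) and (7), the following Python code iterates the update for a two-player game. The payoff matrices `U1` and `U2`, the function names, the uniform starting strategies, and the fixed number of steps are illustrative assumptions, not the setup of the experiments. Payoffs are assumed to be positive so that raising them to the power $\alpha$ is well defined.

```python
def normalize_power(values, alpha):
    """Raise each payoff to the power alpha, then normalize to sum to 1."""
    powered = [v ** alpha for v in values]
    total = sum(powered)
    return [w / total for w in powered]

def action_payoffs(u1, u2, p1, p2):
    """Expected payoff of each action of each player against the other's strategy."""
    psi1 = [sum(u1[a][b] * p2[b] for b in range(len(p2))) for a in range(len(p1))]
    psi2 = [sum(u2[a][b] * p1[a] for a in range(len(p1))) for b in range(len(p2))]
    return psi1, psi2

def play(u1, u2, alpha, steps=200):
    """Iterate the update from uniform strategies (an assumed starting point)."""
    p1 = [1.0 / len(u1)] * len(u1)
    p2 = [1.0 / len(u1[0])] * len(u1[0])
    for _ in range(steps):
        psi1, psi2 = action_payoffs(u1, u2, p1, p2)
        # Both players update simultaneously from the payoffs at time t.
        p1, p2 = normalize_power(psi1, alpha), normalize_power(psi2, alpha)
    return p1, p2

# Hypothetical positive payoffs: U1[a][b] for player 1, U2[a][b] for player 2.
U1 = [[3.0, 1.0], [2.0, 2.0]]
U2 = [[2.0, 1.0], [1.0, 3.0]]
p1, p2 = play(U1, U2, alpha=1.0)
print(p1, p2)
```

For this particular game and $\alpha = 1$ the iteration settles to a fixed point of the update; convergence for other games or larger $\alpha$ is not implied by this sketch.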

If a strategy profile $p^*$ is a generalized equilibrium satisfying (2), then there
is a corresponding set of action payoff functions $\{\Psi_1^*(x_1), \Psi_2^*(x_2), \ldots, \Psi_n^*(x_n)\}$
defined by (4), or simply $\Psi^*$, such that (7) is satisfied. That is,

$$\Psi_i^*(x_i) = \sum_{\sim x_i} \Big( u_i(x) \prod_{j \neq i} \overline{(\Psi_j^*(x_j))^{\alpha}} \Big), \quad \text{for } i = 1, 2, \ldots, n. \tag{8}$$

Both a strategy profile $p^*$ satisfying (2) and an action payoff function set $\Psi^*$
satisfying (8) can be used to represent a generalized equilibrium. Based on (4),
we have

$$\Psi_i^*(x_i) = \sum_{\sim x_i} \Big( u_i(x) \prod_{j \neq i} p_j^*(x_j) \Big), \quad \text{for } i = 1, 2, \ldots, n.$$

Based on (6), we have

$$p_i^*(x_i) = \overline{(\Psi_i^*(x_i))^{\alpha}}, \quad \text{for } i = 1, 2, \ldots, n.$$



The Proof The best action of player $i$ at time $t$ is defined as the one with the
highest payoff, i.e., the $x_i$ that maximizes the action payoff function $\Psi_i(x_i, t)$.
Assume that the total number of actions of player $i$ is $m_i$. Assume further that
$\alpha > 1$. At a generalized equilibrium with a strategy profile $p^*$ and its corresponding
action payoff function set $\Psi^*$, based on (5), we can find out the difference
between the best payoff $\max_{x_i} \Psi_i^*(x_i)$ and the expected payoff $\sum_{x_i} \Psi_i^*(x_i) p_i^*(x_i)$.
It is straightforward to verify that the difference should satisfy the following
inequality:

$$0 \le \max_{x_i} \Psi_i^*(x_i) - \sum_{x_i} \Psi_i^*(x_i) p_i^*(x_i) \le \Big( \frac{m_i - 1}{e} \max_{x_i} \Psi_i^*(x_i) \Big) \alpha^{-1}.$$

Obviously, the difference can be arbitrarily small when the parameter $\alpha$ is
sufficiently large. That is, the difference is reduced to zero when $\alpha \to \infty$,

$$\lim_{\alpha \to \infty} \Big( \max_{x_i} \Psi_i^*(x_i) - \sum_{x_i} \Psi_i^*(x_i) p_i^*(x_i) \Big) = 0, \quad \text{for any } i. \tag{9}$$
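
The shrinking gap can be checked numerically. The sketch below takes a hypothetical vector of positive action payoffs, forms the strategy as the normalized $\alpha$-th power of the payoffs, and compares the gap between the best and the expected payoff with a bound of the assumed form $\frac{m_i - 1}{e} \max_{x_i} \Psi_i(x_i) \, \alpha^{-1}$. The payoff values and function names are illustrative, and the payoffs are held fixed rather than recomputed at an equilibrium for each $\alpha$.

```python
import math

def payoff_gap(psi, alpha):
    """Best payoff minus expected payoff under p = normalized psi**alpha."""
    top = max(psi)
    # Dividing by the maximum before taking powers avoids overflow and
    # leaves the normalized strategy unchanged.
    weights = [(v / top) ** alpha for v in psi]
    total = sum(weights)
    expected = sum(v * w / total for v, w in zip(psi, weights))
    return top - expected

def gap_bound(psi, alpha):
    """Assumed bound: ((m - 1) / e) * max(psi) / alpha, with m = len(psi)."""
    return (len(psi) - 1) / math.e * max(psi) / alpha

psi = [1.0, 0.9, 0.5]  # hypothetical positive action payoffs
for alpha in (1, 10, 100, 1000):
    print(alpha, payoff_gap(psi, alpha), gap_bound(psi, alpha))
```

In this example the gap stays below the bound for every listed $\alpha$ and becomes negligible for large $\alpha$.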



Given a strategy profile $p^*$, it is a Nash equilibrium if and only if, given any
player, its best payoff $\max_{x_i} \Psi_i^*(x_i)$ is equal to its expected payoff
$\sum_{x_i} \Psi_i^*(x_i) p_i^*(x_i)$. That is, for any $i$,

$$\max_{x_i} \Psi_i^*(x_i) - \sum_{x_i} \Psi_i^*(x_i) p_i^*(x_i) = 0. \tag{10}$$

Comparing the statement (9) with the statement (10), we can conclude that
any generalized equilibrium (8) can be arbitrarily close to a Nash equilibrium if
the parameter $\alpha$ is sufficiently large.

The other way around is also true. That is, for any Nash equilibrium, there
exists a generalized equilibrium defined as (8) which is arbitrarily close to the
Nash equilibrium if the parameter $\alpha$ is sufficiently large. To prove this statement,
recall that the action payoff function $\Psi_i(x_i)$ computed by (7) is the payoff of
player $i$ taking the action $x_i$ while the other players take the strategies $p_j$ ($j \neq i$).
Assume that a strategy profile $p^*$ is a Nash equilibrium. Then the payoff $\Psi_i^*(x_i)$
at the Nash equilibrium should satisfy the following condition,

$$\max_{x_i} \Psi_i^*(x_i) = \Psi_i^*(x_i), \quad \text{if } p_i^*(x_i) > 0.$$

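Let $\epsilon$ be a positive infinitesimal. Note that for any probability $p_i$, if $0 < p_i \le 1$,
then

$$\lim_{\epsilon \to 0^+} (1 + \epsilon \ln p_i)^{1/\epsilon} = p_i.$$

Otherwise, if $p_i = 0$, then

$$\lim_{\epsilon \to 0^+} (1 + \epsilon \ln \epsilon)^{1/\epsilon} = p_i \; (= 0).$$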
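
Both limits are easy to check numerically. The following sketch evaluates the two expressions for a decreasing sequence of $\epsilon$; the function names and the sample probability 0.3 are illustrative choices.

```python
import math

def power_limit(p, eps):
    """(1 + eps * ln p) ** (1 / eps), which tends to p as eps -> 0+."""
    return (1.0 + eps * math.log(p)) ** (1.0 / eps)

def zero_limit(eps):
    """(1 + eps * ln eps) ** (1 / eps), which tends to 0 as eps -> 0+."""
    return (1.0 + eps * math.log(eps)) ** (1.0 / eps)

# eps must be small enough that the bases stay positive.
for eps in (1e-2, 1e-4, 1e-6):
    print(eps, power_limit(0.3, eps), zero_limit(eps))
```

The first column of results approaches 0.3 and the second approaches 0 as $\epsilon$ decreases.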
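Given each player $i$, $i = 1, 2, \ldots, n$, define its action payoff function $\Psi_i^{\epsilon}(x_i)$
as

$$\Psi_i^{\epsilon}(x_i) = \begin{cases}
(1 + \epsilon \ln p_i^*(x_i)) \max_{x_i} \Psi_i^*(x_i), & \text{if } p_i^*(x_i) > 0; \\
(1 + \epsilon \ln \epsilon) \max_{x_i} \Psi_i^*(x_i), & \text{if } p_i^*(x_i) = 0 \text{ and } \max_{x_i} \Psi_i^*(x_i) = \Psi_i^*(x_i); \\
\Psi_i^*(x_i), & \text{if } \Psi_i^*(x_i) < \max_{x_i} \Psi_i^*(x_i).
\end{cases}$$

Obviously,

$$\lim_{\epsilon \to 0^+} \Psi_i^{\epsilon}(x_i) = \Psi_i^*(x_i).$$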



Let $\alpha = 1/\epsilon$, from (5) used for computing the strategy $p_i(x_i, t)$, we have

$$\lim_{\epsilon \to 0^+} \frac{(\Psi_i^{\epsilon}(x_i))^{1/\epsilon}}{\sum_{x_i} (\Psi_i^{\epsilon}(x_i))^{1/\epsilon}} = p_i^*(x_i), \quad \text{for } i = 1, 2, \ldots, n.$$

Hence, the set of action payoff functions $\{\Psi_1^{\epsilon}(x_1), \Psi_2^{\epsilon}(x_2), \ldots, \Psi_n^{\epsilon}(x_n)\}$ is a
generalized equilibrium satisfying (8) when the parameter $\alpha$ is sufficiently large. Its
corresponding strategy profile $\{p_1^*(x_1), p_2^*(x_2), \ldots, p_n^*(x_n)\}$ is the strategy
profile $p^*$ of the Nash equilibrium in the assumption. In other words, for any Nash
equilibrium with a strategy profile $p^*$, there always exists a generalized
equilibrium satisfying (8) which is arbitrarily close to the Nash equilibrium when the
selfishness level $\alpha$ is sufficiently large.



5.2 Theoretical Investigation 

From Cooperative Optimization to the Constructive Generalization 

The constructive generalization can be derived from a recently discovered general
global optimization method, called cooperative optimization [5]. Cooperation is
a ubiquitous phenomenon in nature. The cooperative optimization theory is a
mathematical theory for understanding cooperative behaviors and translating them
into optimization algorithms. The major theoretical results can be found in [8].

Let $E(x_1, x_2, \ldots, x_n)$, or simply $E(x)$, be a multivariate objective function of
$n$ variables. Assume that $E(x)$ can be decomposed into $n$ sub-objective functions
$E_i(x)$, one for each variable, such that those sub-objective functions satisfy

$$E_1(x) + E_2(x) + \cdots + E_n(x) = E(x).$$

In terms of a multi-agent system, let us assign Ei(x) as the objective function 
for agent i, for i = 1,2, ... ,n. There are n agents in the system in total. The 
objective of each agent i is to maximize Ei(x). The objective of the system is to 
maximize E(x), called the global objective function. 

There is a simple form of cooperative optimization where each agent $i$ is
associated with a function $\Psi_i(x_i, t)$ defined on the variable $x_i$ and time $t$. The
function is called the assignment function for the agent. Each agent updates its
assignment function iteratively as follows:

$$\Psi_i(x_i, t) = \sum_{\sim x_i} \Big( e^{E_i(x)/h} \prod_{j \neq i} p_j(x_j, t-1) \Big), \quad \text{for } i = 1, 2, \ldots, n, \tag{11}$$

where $\sum_{\sim x_i}$ stands for the summation over all variables except $x_i$ and $h$ is a
constant of a small positive value. $p_i(x_i, t)$ is defined as

$$p_i(x_i, t) = \frac{(\Psi_i(x_i, t))^{\alpha}}{\sum_{x_i} (\Psi_i(x_i, t))^{\alpha}}, \tag{12}$$

where $\alpha$ is a parameter of a non-negative real value.

By the definition, $p_i(x_i, t)$ is a probability-like function satisfying

$$\sum_{x_i} p_i(x_i, t) = 1.$$

It is, therefore, called the assignment probability function. It defines the soft
decisions for assigning variable $x_i$ at the time instance $t$. If a variable value $x_i$
is of a higher function value $p_i(x_i, t)$, then it is more likely to be assigned to the
$i$-th variable than any other value of a lower function value.

The assignment function $\Psi_i(x_i, t)$ is also called the assignment state function,
representing the state of agent $i$ at the time instance $t$. From (12) we can see
that the assignment probability function $p_i(x_i, t)$ is defined as the assignment
state function $\Psi_i(x_i, t)$ to the power $\alpha$ with normalization.

With the bar notation for normalization introduced in subsection 5.1,
the iterative update function (11) can be rewritten as

$$\Psi_i(x_i, t) = \sum_{\sim x_i} \Big( e^{E_i(x)/h} \prod_{j \neq i} \overline{(\Psi_j(x_j, t-1))^{\alpha}} \Big), \quad \text{for } i = 1, 2, \ldots, n. \tag{13}$$

Without loss of generality, let the utility function $u_i(x)$ for the agent $i$ be

$$u_i(x) = e^{E_i(x)/h}.$$

In this case, the agent $i$ tries to maximize the utility function $u_i(x)$ instead of
maximizing the objective function $E_i(x)$, where the former task is fully equivalent
to the latter because the exponential function is strictly increasing and $h > 0$.
Accordingly, the simple form (13) of cooperative optimization
becomes exactly the same as the iterative update function (7) for the action payoff
function $\Psi_i(x_i, t)$. The assignment probability function $p_i(x_i, t)$ of agent $i$ in (13)
is called the strategy of player $i$ in (7).



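Some Computational Properties of Cooperative Optimization In the
simple form (13) of cooperative optimization, we can replace the constant $\alpha$ by
$\lambda(t) w_{ij}$, where both $\lambda(t)$ and $w_{ij}$ are parameters, i.e.,

$$\Psi_i(x_i, t) = \sum_{\sim x_i} \Big( e^{E_i(x)/h} \prod_{j \neq i} \overline{(\Psi_j(x_j, t-1))^{\lambda(t) w_{ij}}} \Big). \tag{14}$$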

Note that a summation operator can be approximated by a maximization
operator as follows:

$$\sum_x e^{f(x)/h} \approx \max_x e^{f(x)/h}.$$

(Under the assumption that the function $f(x)$ has a unique global maximum.)
Such an approximation becomes accurate when $h \to 0^+$, i.e.,

$$\lim_{h \to 0^+} \Big( \max_x e^{f(x)/h} - \sum_x e^{f(x)/h} \Big) = 0.$$
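
One way to see this approximation at work is in the log domain, where it reads $h \ln \sum_x e^{f(x)/h} \approx \max_x f(x)$ for small $h$. The sketch below evaluates the left side for a few values of $h$; the function name and the sample values of $f$ are illustrative assumptions.

```python
import math

def soft_max(values, h):
    """h * ln(sum_x exp(f(x) / h)), computed stably by factoring out the max."""
    top = max(values)
    return top + h * math.log(sum(math.exp((v - top) / h) for v in values))

# Hypothetical values of f(x) with a unique global maximum of 3.5.
f_values = [1.0, 2.0, 3.5]
for h in (1.0, 0.1, 0.01):
    print(h, soft_max(f_values, h))
```

As $h$ decreases, the printed values approach the maximum 3.5 from above.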



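With this approximation, the iterative update function (14) becomes

$$\Psi_i(x_i, t) = \max_{\sim x_i} \Big( e^{E_i(x)/h} \prod_{j \neq i} \overline{(\Psi_j(x_j, t-1))^{\lambda(t) w_{ij}}} \Big).$$

Taking the logarithm of both sides, we have

$$\Psi_i(x_i, t) = \max_{\sim x_i} \Big( E_i(x) + \lambda(t) \sum_{j \neq i} w_{ij} \Psi_j(x_j, t-1) \Big). \tag{15}$$

This is the original general form of cooperative optimization.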

In this form, each agent optimizes an objective function defined at the right
side of the above equation. It is called the compromised objective function in the
sense that it is the linear combination of the original objective function $E_i(x)$
for agent $i$ and the assignment state functions $\Psi_j(x_j, t-1)$ of other agents $j$ at
the previous time instance $t-1$. Given a variable value $x_i$, the function value
$\Psi_i(x_i, t)$ stores the maximal value of the compromised objective function with
the $i$-th variable fixed to the value.

Let $x_i(t)$ be the value of $x_i$ with the highest function value $\Psi_i(x_i, t)$, i.e.,

$$x_i(t) = \arg\max_{x_i} \Psi_i(x_i, t). \tag{16}$$

That value represents the best value of $x_i$ at iteration time instance $t$ for
maximizing the compromised objective function defined at the right side of (15). The
solution of the system at iteration time instance $t$ is the collection of those best
values as follows

$$(x_1(t), x_2(t), \ldots, x_n(t)), \quad \text{simply } x(t).$$

All of the parameters $w_{ij}$ together form an $n \times n$ matrix called the propagation
matrix $W$. To have $\sum_i E_i(x)$ as the global utility function to be maximized, it is
required that the propagation matrix $W = (w_{ij})_{n \times n}$ is non-negative, irreducible,
aperiodic, and satisfying

$$\sum_{i=1}^{n} w_{ij} = 1, \quad \text{for } j = 1, 2, \ldots, n.$$

Theorem 3. Given a constant cooperation strength $\lambda$ of a non-negative value
less than 1 ($0 \le \lambda < 1$), the general form (15) of cooperative optimization has
one and only one equilibrium. It always converges to the unique equilibrium with
an exponential rate regardless of initial conditions.
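
A minimal sketch of the general form (15) with a constant cooperation strength is given below for three binary variables. The sub-objective functions `E`, the propagation matrix `W`, the zero initial assignment functions, and the restriction of the coupling sum to the other agents $j \neq i$ are all assumptions made for illustration; they are not taken from the cited work.

```python
from itertools import product

def cooperative_optimization(E, W, lam, n, domain=(0, 1), steps=50):
    """Iterate the max-form update with constant cooperation strength lam."""
    psi = [{v: 0.0 for v in domain} for _ in range(n)]  # assumed initial state
    for _ in range(steps):
        new_psi = []
        for i in range(n):
            row = {}
            for v in domain:
                best = float("-inf")
                # Maximize over all variables except x_i, which is fixed to v.
                for x in product(domain, repeat=n):
                    if x[i] != v:
                        continue
                    # Assumption: the coupling sum runs over other agents j != i.
                    coupling = sum(W[i][j] * psi[j][x[j]]
                                   for j in range(n) if j != i)
                    best = max(best, E[i](x) + lam * coupling)
                row[v] = best
            new_psi.append(row)
        psi = new_psi
    # Each agent picks the value with the highest assignment function value.
    solution = tuple(max(domain, key=lambda v: psi[i][v]) for i in range(n))
    return solution, psi

# Hypothetical sub-objectives: neighbors are rewarded for agreeing,
# and agent 0 has a small preference for the value 1.
E = [
    lambda x: float(x[0] == x[1]) + 0.5 * x[0],
    lambda x: float(x[1] == x[2]),
    lambda x: float(x[2] == x[0]),
]
# Non-negative, irreducible, aperiodic matrix whose columns sum to 1.
W = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]

solution, psi = cooperative_optimization(E, W, lam=0.5, n=3)
print(solution)
```

In this small example the agents agree on a single assignment, and it coincides with the maximizer of $E_1(x) + E_2(x) + E_3(x)$ found by exhaustive search.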
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To be more general, assume that the objective function Ei(x) for agent i is 
defined on variable set $X_i$. Recall that the solution at iteration $t$ is $x(t)$ (see
(16)). Let $x(t)(X_i)$ denote the restriction of the solution on $X_i$.

Definition 2. The solution $x(t)$ is called a consensus solution if it is the optimal
solution for each optimization problem defined by (15). That is,

$$x(t)(X_i) = \arg\max_{X_i} \Big( E_i(x) + \lambda(t) \sum_{j \neq i} w_{ij} \Psi_j(x_j, t-1) \Big), \quad \text{for } i = 1, 2, \ldots, n.$$

Theorem 4. If the general form (15) of cooperative optimization converges to
a consensus equilibrium with a constant $\lambda$ satisfying $0 \le \lambda < 1$, then it must be
the global optimum of the global objective function $E_1(x) + E_2(x) + \cdots + E_n(x)$.

From (15), we can see that the agents can increase the chance of reaching a
consensus when the value of the parameter $\lambda$ is increased. However, when $\lambda > 1$,
it is no longer guaranteed that any consensus equilibrium is the global optimum.
Also, the uniqueness of equilibrium is no longer guaranteed. Assume that the
maximization of $E_i(x)$, for any $i$, also leads to the maximization of the global
objective function $E(x)$. Then, when $\lambda \to \infty$, the cooperative optimization (15)
falls back to local search, a classic optimization method (see Section 3.5 in [8]).
A local search algorithm can have many local optimal solutions and the number
of them may grow exponentially with the problem size.

In summary, the cooperative optimization algorithm (15) is absolutely stable
when the cooperation strength $\lambda$ is less than one ($\lambda < 1$). Above that value, the
number of equilibria may grow with the value. As a consequence, the algorithm
may become less stable because it can get stuck in one equilibrium or another.
On the other hand, the chance of reaching a consensus equilibrium increases.
A consensus equilibrium is guaranteed to be the global optimal one only when
$\lambda < 1$. Hence, the performance of the algorithm usually peaks at some positive
value of the cooperation strength $\lambda$. It deteriorates when the value is moved
away from the best performing value, either further up or further down towards
the value zero.

The above investigations are not on a rigorous basis. The exact performance
of the cooperative optimization algorithm (15) in relationship with the cooperation
strength $\lambda$ is an open question.
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